Communication code generation for automatic parallelization of irregular loops

FU LiGuo, YAO Yuan, DING Rui

Journal of Computer Applications 2014, 34 (4): 1014-1018. DOI: 10.11772/j.issn.1001-9081.2014.04.1014

Irregular computations are common in large-scale parallel applications, and automatic parallelization for distributed memory struggles to generate parallel code for irregular loops at compile time. The communication code in the generated parallel program affects both the correctness and the efficiency of the program's execution. Using partial communication redundancy, useful communication code can be generated automatically at compile time for a common class of irregular loops; this requires analyzing the program's array redistribution graph to maintain the producer-consumer relation of irregular array references. The approach uses the computation decomposition and the access expressions of the array references to find the local definition set of the irregular array on each processor, which serves as the communication data set. It then analyzes communication strategies for such irregular loops and generates the corresponding communication code. The experimental results confirm the validity of the approach and show the expected speedup on the test applications.
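
The idea of a per-processor local definition set can be illustrated with a minimal sketch. It assumes a loop of the form `for i: A[idx[i]] = ...`, a block distribution of both iterations and array elements, and a concrete index array `idx`; all function names are hypothetical and not taken from the paper. The paper performs its analysis at compile time, whereas this sketch simply enumerates the sets for a known `idx` to show what the communication data set contains.

```python
def block_owner(index, n, num_procs):
    """Processor that owns position `index` when n items are block-distributed."""
    block = -(-n // num_procs)  # ceiling division
    return index // block


def local_definition_sets(idx, num_procs):
    """Elements of A written by each processor for `for i: A[idx[i]] = ...`.

    Assumption: iterations are block-distributed (the computation decomposition).
    """
    defs = [set() for _ in range(num_procs)]
    for i, target in enumerate(idx):
        defs[block_owner(i, len(idx), num_procs)].add(target)
    return defs


def communication_sets(idx, array_size, num_procs):
    """Map (sender, receiver) to the sorted elements the sender must ship.

    An element is communicated when the processor that defines it is not
    the processor that owns it under the block distribution of A.
    """
    comm = {}
    for p, defined in enumerate(local_definition_sets(idx, num_procs)):
        for element in defined:
            q = block_owner(element, array_size, num_procs)
            if q != p:
                comm.setdefault((p, q), []).append(element)
    return {pair: sorted(elements) for pair, elements in comm.items()}
```

For example, with `idx = [3, 0, 2, 1]` on two processors, processor 0 defines elements 3 and 0 but owns only 0 and 1, so element 3 must be sent to processor 1.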

Parallel cost model for heterogeneous multi-core processors

HUANG Pinfeng, ZHAO Rongcai, YAO Yuan, ZHAO Jie

Journal of Computer Applications 2013, 33 (06): 1544-1547. DOI: 10.3724/SP.J.1087.2013.01544

Most existing parallel cost models are designed for shared-memory or distributed-memory architectures and are therefore not suitable for heterogeneous multi-core processors. To solve this problem, a new parallel cost model for heterogeneous multi-core processors was proposed. It quantitatively describes the impact of computing capacity, memory access delay, and data transfer cost on the parallel execution time of loops, thereby improving the accuracy of accelerated parallel loop recognition. The experimental results show that the proposed model can effectively recognize accelerated parallel loops. Using its recognition results to generate parallel code can significantly improve the performance of parallel programs on heterogeneous multi-core processors.
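
How the three factors named in the abstract could combine into a profitability test is sketched below. The abstract does not give the model's formula, so the additive form, the field names, and the cycle-based units here are all assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Loop:
    iterations: int
    compute_cycles: float   # host cycles of computation per iteration
    mem_accesses: float     # memory accesses per iteration
    transfer_bytes: float   # data moved to and from the accelerator cores


@dataclass
class Target:
    cores: int
    relative_speed: float   # computing capacity of one core relative to the host
    mem_delay: float        # cycles per memory access on the target
    cycles_per_byte: float  # data transfer cost


def sequential_time(loop, host_mem_delay):
    """Estimated cycles when the loop runs on a single host core."""
    per_iter = loop.compute_cycles + loop.mem_accesses * host_mem_delay
    return loop.iterations * per_iter


def parallel_time(loop, target):
    """Estimated cycles when iterations are split evenly across target cores."""
    per_iter = (loop.compute_cycles / target.relative_speed
                + loop.mem_accesses * target.mem_delay)
    iters_per_core = math.ceil(loop.iterations / target.cores)
    return iters_per_core * per_iter + loop.transfer_bytes * target.cycles_per_byte


def is_profitable(loop, target, host_mem_delay):
    """A loop counts as worth parallelizing only if the estimate beats sequential."""
    return parallel_time(loop, target) < sequential_time(loop, host_mem_delay)
```

In this sketch a loop with little computation per transferred byte is rejected, because the transfer term outweighs the gain from splitting the iterations.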

Superword level parallelism instruction analysis and redundancy optimization algorithm on DSP

SUO Wei-yi, ZHAO Rong-cai, YAO Yuan, LIU Peng

Journal of Computer Applications 2012, 32 (12): 3303-3307. DOI: 10.3724/SP.J.1087.2012.03303

Today, SIMD (Single Instruction Multiple Data) technology is widely used in Digital Signal Processors (DSPs), and most existing compilers implement automatic vectorization. However, compilers cannot support SIMD auto-vectorization that exploits DSP features, because of the DSP's complex instruction set, its specific addressing modes, dependence relations that obstruct vectorization, non-aligned data, and other reasons. To solve this problem, this paper improves the instruction analysis and redundancy optimization algorithms for Superword Level Parallelism (SLP) automatic vectorization in the Open64 compiler back end, so that source programs are transformed into more efficient vectorized code. The experimental results show that the proposed method can effectively improve DSP performance and reduce power consumption.
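
The basic SLP step of packing isomorphic statements that touch adjacent memory can be sketched as follows. This is a simplified illustration, not the paper's algorithm or Open64 code: the `Stmt` representation and function names are hypothetical, statements are assumed to be consecutive in program order, and the dependence and alignment checks that a real vectorizer needs are omitted.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    op: str       # operation name, e.g. "add"
    dest: tuple   # (array name, index) written by the statement
    srcs: tuple   # tuple of (array name, index) operands read


def adjacent(a, b):
    """True if reference b is the element right after a in the same array."""
    return a[0] == b[0] and b[1] == a[1] + 1


def packable(s, t):
    """Same operation, and every memory reference of t follows that of s."""
    return (s.op == t.op
            and len(s.srcs) == len(t.srcs)
            and adjacent(s.dest, t.dest)
            and all(adjacent(x, y) for x, y in zip(s.srcs, t.srcs)))


def find_packs(stmts, width):
    """Group runs of `width` packable statements; each group is one SIMD op."""
    packs, i = [], 0
    while i < len(stmts):
        group = [stmts[i]]
        while (len(group) < width
               and i + len(group) < len(stmts)
               and packable(group[-1], stmts[i + len(group)])):
            group.append(stmts[i + len(group)])
        if len(group) == width:
            packs.append(group)
            i += width
        else:
            i += 1  # partial group: leave this statement scalar
    return packs
```

Four statements `c[k] = a[k] + b[k]` for `k = 0..3` form one pack at width 4 or two packs at width 2; a following statement with a different operation stays scalar.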